Trainable Tree Distance and an Application to Question Categorisation
Abstract
Continuing a line of work initiated in Boyer et al. (2007), the generalisation of stochastic string distance to a stochastic tree distance, specifically to stochastic Tai distance, is considered. An issue in modifying Zhang/Shasha tree-distance for stochastic variants is noted, a Viterbi EM cost-adaptation algorithm for this distance is proposed, and a counter-example to an all-paths EM proposal is noted. Experiments are reported in which a kNN categorisation algorithm is applied to a semantically categorised, syntactically annotated corpus. We show that a 67.7% baseline using standard unit costs can be improved to 72.5% by cost adaptation.

1 Theory and Algorithms

The classification of syntactic structures into semantic categories arises in a number of settings. A possible approach to such a classifier is to compute a category for a test item based on its distances to a set of k nearest neighbours in a precategorised example set. This paper takes such an approach, deploying variants of a tree-distance measure, which has been used with some success in a variety of semantically-oriented tasks such as Question-Answering, Entailment Recognition and Semantic Role Labelling (Punyakanok et al., 2004; Kouylekov and Magnini, 2005; Emms, 2006a; Emms, 2006b; Franco-Penya, 2010). An issue which will be considered is how to adapt the atomic costs underlying the tree-distance measure.

Tai (1979) first proposed a tree-distance measure. Where S and T are ordered, labelled trees, a Tai mapping is a partial, 1-to-1 mapping σ from the nodes of S to the nodes of T, which respects left-to-right order and ancestry, such as a...
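The kNN categorisation by tree distance described above can be illustrated with a minimal sketch. It assumes unit costs (the baseline setting), a plain majority vote among the k nearest neighbours, and a simple memoised forest recursion rather than the Zhang/Shasha algorithm; the tuple encoding of trees and the names `forest_distance`, `tree_distance` and `knn_category` are hypothetical choices made for illustration, not taken from the paper.

```python
from collections import Counter
from functools import lru_cache

# Assumed encoding: a tree is (label, children), children being a tuple of trees.
# A forest is a tuple of trees.

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return size(g)          # insert every remaining node of g
    if not g:
        return size(f)          # delete every remaining node of f
    (v_label, v_kids), (w_label, w_kids) = f[-1], g[-1]
    # Delete the rightmost root of f; its children join the forest.
    delete = forest_distance(f[:-1] + v_kids, g) + 1
    # Insert the rightmost root of g.
    insert = forest_distance(f, g[:-1] + w_kids) + 1
    # Map the two rightmost roots onto each other (cost 0 if labels agree).
    swap = (forest_distance(f[:-1], g[:-1])
            + forest_distance(v_kids, w_kids)
            + (0 if v_label == w_label else 1))
    return min(delete, insert, swap)

def tree_distance(s, t):
    """Unit-cost tree distance between two ordered, labelled trees."""
    return forest_distance((s,), (t,))

def knn_category(test_tree, examples, k=3):
    """Majority category among the k examples nearest to test_tree.

    examples is a list of (tree, category) pairs.
    """
    ranked = sorted(examples, key=lambda ex: tree_distance(test_tree, ex[0]))
    votes = Counter(category for _, category in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data: labels and categories are made up for illustration.
t1 = ("a", (("b", ()), ("c", ())))
t2 = ("a", (("b", ()),))
t3 = ("z", (("y", ()), ("w", ()), ("q", ())))
examples = [(t1, "X"), (t2, "X"), (t3, "Y")]
print(tree_distance(t1, t2))
print(knn_category(t2, examples, k=1))
```

Cost adaptation would replace the fixed unit costs in `forest_distance` with trained per-label costs; that step is not attempted in this sketch.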
Similar resources
Automatic utterance type detection using suprasegmental features
The goal of the work presented here is to automatically predict the type of an utterance in spoken dialogue by using automatically extracted suprasegmental information. For this task, we present and compare three stochastic algorithms: hidden Markov models, artificial neural nets, and classification and regression trees. These models are easily trainable, reasonably robust and fit into the prob...
Adapting Tree Distance to Answer Retrieval and Parser Evaluation
The results of experiments on the application of tree-distance to an answer-retrieval task are reported. Various parameters in the definitions of tree-distance are considered, including whole-vs-sub tree, node weighting, wild cards and lexical emphasis. The results show that improving parse-quality maps to improved performance on this tree-distance answer-retrieval task. It is also shown that one o...
Variants Of Tree Similarity In A Question Answering Task
The results of experiments on the application of a variety of distance measures to a question-answering task are reported. Variants of tree-distance are considered, including whole-vs-sub tree, node weighting, wild cards and lexical emphasis. We derive string-distance as a special case of tree-distance and show that a particular parameterisation of tree-distance outperforms the string-distance ...
Clustering by Tree Distance for Parse Tree Normalisation
The application of tree-distance to clustering is considered. Previous work identified some parameters which favourably affect the use of tree-distance in question-answering tasks. Some evidence is given that the same parameters favourably affect the cluster quality. A potential application is in the creation of systems to carry out transformation of interrogative to indicative sentences, a fir...
Web Categorisation Using Distance-Based Decision Trees
In Web classification, web pages are assigned to pre-defined categories mainly according to their content (content mining). However, the structure of the web site might provide extra information about their category (structure mining). Traditionally, both approaches have been applied separately, or are dealt with using techniques that do not generate a model, such as Bayesian techniques. Unfortunatel...